An Exploratory Study of the Malay Text Processing Tools in Ontology Learning
Abstract
This paper discusses the overall process of learning a taxonomy from Malay texts using an unsupervised conceptual clustering approach, and investigates existing Malay NLP tools as potential pre-processing tools for the proposed ontology learning approach. The tools are a maximum-entropy parser based on the open NLP package, a word sense tagger, and a parser based on pola grammar. A case study approach, deemed suitable for exploratory research, is adopted. Each NLP tool shows low recall and precision. The poor results are attributed to several factors, including the texts used in the experiment.
Similar resources
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
A new text-mining method for extracting user context information to improve the ranking of search engine results
Today, the importance of text processing and its uses is well known among researchers and students. The amount of textual material increases day by day, so we need effective ways to store it and retrieve information from it. For example, search engines such as Google, Yahoo, and Bing need to read a great many web documents and retrieve the most similar ones to the user ...
Corpus based coreference resolution for Farsi text
"Coreference resolution", or finding all expressions that refer to the same entity in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
A Hybrid Approach for Learning Concept Hierarchy from Malay Text Using GAHC and Immune Network
The human immune system provides inspiration for solving the knowledge acquisition bottleneck in developing ontologies for semantic web applications. In this paper, we propose an extension to the Guided Agglomerative Hierarchical Clustering (GAHC) method that uses an Artificial Immune Network (AIN) algorithm to improve the process of automatically building and expanding the concept ...
Semantic Similarity Measures for Malay Sentences
The concept of semantic similarity is an important element in many applications such as information extraction, information retrieval, document clustering and ontology learning. Most previous semantic similarity measures have been defined between words or concepts (i.e. word-to-word similarity), thus ignoring the text or sentence that the concepts participat...